[smoke][bugfix] moe_init_routing_v2 active_expert_range use int type by shenchuxiaofugui · Pull Request #5521 · vllm-project/vllm-ascend

shenchuxiaofugui · 2025-12-30T09:26:15Z

What this PR does / why we need it?

The float kernel of MOE_init_routing_v2 used in the dispatch allgather operation does not accept a tensor for active_expert_range; it only supports int.
PR5311 unified the variables local_num_experts and self.local_num_experts by using self.local_num_experts consistently. As a result, a parameter that had previously been an integer was passed to the kernel as a tensor.

Does this PR introduce any user-facing change?

How was this patch tested?

gsm8k | exact_match,strict-match: ground_truth=0.89 | measured=0.8939 | success=✅
gsm8k | exact_match,flexible-extract: ground_truth=0.85 | measured=0.856 | success=✅
ceval-valid | acc,none: ground_truth=0.84 | measured=0.8373 | success=✅
Model Parameters:
{'pretrained': 'Qwen/Qwen3-30B-A3B', 'tensor_parallel_size': 2, 'dtype': 'auto', 'trust_remote_code': False, 'max_model_len': 4096, 'gpu_memory_utilization': 0.6, 'enable_expert_parallel': True}

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@45c1ca1

gemini-code-assist

Code Review

This pull request addresses a bug where num_local_experts could be a torch.Tensor, causing a type error in the npu_moe_init_routing_v2 kernel which expects an integer for its active_expert_range parameter. The fix correctly handles this by checking if num_local_experts is a tensor and extracting its value with .item(), or casting it to an integer otherwise. This change is correct and effectively resolves the issue.
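
The coercion described in the review can be sketched roughly as follows. This is a minimal, torch-free illustration, not the code merged in this PR: the helper name `to_int_scalar` and the `OneElementTensor` stand-in class are hypothetical, the check uses duck typing on `.item()` instead of an `isinstance(..., torch.Tensor)` test, and the two-element list form of `active_expert_range` is an assumption.

```python
from typing import Any


def to_int_scalar(value: Any) -> int:
    """Return a Python int from a plain number or a one-element tensor-like value."""
    item = getattr(value, "item", None)
    if callable(item):
        # Tensor-like path: torch.Tensor.item() returns a Python scalar
        # for a one-element tensor.
        return int(item())
    # Plain Python number (or anything else int() accepts).
    return int(value)


class OneElementTensor:
    """Hypothetical stand-in for a one-element torch.Tensor, used only in this demo."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return self._value


# Assumption: the range is passed to the kernel as a list of plain ints.
num_local_experts = OneElementTensor(64)
active_expert_range = [0, to_int_scalar(num_local_experts)]
print(active_expert_range)  # prints [0, 64]
```

Normalizing the value once, right before the kernel call, keeps the rest of the code free to store the expert count either as an int or as a tensor.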

github-actions · 2025-12-30T11:44:39Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>

…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP &DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)

…llm-project#5521)

### What this PR does / why we need it?
The float kernel of MOE_init_routing_v2 used in the dispatch allgather operation does not accept a tensor for active_expert_range; it only supports int.
PR5311 unified the variables `local_num_experts` and `self.local_num_experts` by using `self.local_num_experts` consistently. As a result, a parameter that had previously been an integer was passed to the kernel as a tensor.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
gsm8k | exact_match,strict-match: ground_truth=0.89 | measured=0.8939 | success=✅
gsm8k | exact_match,flexible-extract: ground_truth=0.85 | measured=0.856 | success=✅
ceval-valid | acc,none: ground_truth=0.84 | measured=0.8373 | success=✅
Model Parameters:
{'pretrained': 'Qwen/Qwen3-30B-A3B', 'tensor_parallel_size': 2, 'dtype': 'auto', 'trust_remote_code': False, 'max_model_len': 4096, 'gpu_memory_utilization': 0.6, 'enable_expert_parallel': True}

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>


gemini-code-assist Bot reviewed Dec 30, 2025

shenchuxiaofugui changed the title ~~[smoke][bugfix] moe_init_routing_v2 use int type~~ [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type Dec 30, 2025

shenchuxiaofugui force-pushed the smoke_1230 branch from ceb69aa to 16211fe Compare December 30, 2025 10:56

github-actions Bot added the module:ops label Dec 30, 2025

vllm-ascend-ci added the accuracy-test (enable all accuracy test for PR) and ready-for-test (start test by label for PR) labels Dec 30, 2025

zhangxinyuehfad mentioned this pull request Dec 30, 2025

Revert "[EPLB][refactor] Modification of the initialization logic for expert_map and log2phy（depend on pr5285） (#5311)" #5506

Closed

shenchuxiaofugui force-pushed the smoke_1230 branch from 16211fe to fa14124 Compare December 30, 2025 12:38

MengqingCao added the ready (read for review) label Dec 30, 2025

[smoke][bugfix] moe_init_routing_v2 use int type

d2dca75

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>

shenchuxiaofugui force-pushed the smoke_1230 branch from fa14124 to d2dca75 Compare December 30, 2025 13:25

vllm-ascend-ci added and removed the accuracy-test (enable all accuracy test for PR) label Dec 30, 2025

wangxiyuan approved these changes Dec 31, 2025

wangxiyuan merged commit bdc721d into vllm-project:main Dec 31, 2025
42 of 48 checks passed

shenchuxiaofugui deleted the smoke_1230 branch January 12, 2026 10:51
